24 research outputs found

    Handwritten Text Line Detection and Classification based on HMMs

    [EN] In this paper we present an approach for text line analysis and detection in handwritten documents based on Hidden Markov Models, a technique widely used in other handwriting and speech recognition tasks. We show that text line analysis and detection can be solved with a more formal methodology, in contrast to most of the heuristic approaches found in the literature. Our approach not only provides the best position coordinates for each of the vertical page regions but also labels them, thereby surpassing traditional heuristic methods. In our experiments we demonstrate the performance of the approach (both in line analysis and detection) and study the impact of increasingly constrained "vertical layout language models" and morphologic models on text line detection and classification accuracy. Through this experimentation we also show the improvement in quality of the baselines yielded by our approach in comparison with a state-of-the-art heuristic method based on vertical projection profiles. Bosch Campos, V. (2012). Handwritten Text Line Detection and Classification based on HMMs. http://hdl.handle.net/10251/17964
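The abstract above compares against a heuristic based on vertical projection profiles without describing it. As a rough illustration only, the sketch below shows the general idea behind that family of heuristics: sum the ink in each pixel row and treat runs of rows above a threshold as text lines. It is neither the HMM approach of the paper nor the specific heuristic the author evaluated; the function names, the binary list-of-rows image format, and the threshold are all assumptions made for this example.

```python
def vertical_projection_profile(image):
    """Return the amount of ink in each pixel row.

    `image` is assumed to be a list of rows, where 1 marks an ink pixel
    and 0 marks background (a hypothetical, already binarized page).
    """
    return [sum(row) for row in image]


def detect_line_bands(profile, threshold=0):
    """Return (top, bottom) row indices for each run of rows whose
    ink count exceeds `threshold`. Each run is a candidate text line."""
    bands = []
    start = None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y                      # a new band begins
        elif value <= threshold and start is not None:
            bands.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom of the page
        bands.append((start, len(profile) - 1))
    return bands


# Tiny synthetic page: two "text lines" separated by an empty row.
page = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
profile = vertical_projection_profile(page)
bands = detect_line_bands(profile)
```

In a heuristic of this kind, the bottom row of each band could serve as a crude baseline estimate. Fixed thresholds like this tend to break down on skewed or touching lines, which is the sort of limitation that motivates a statistical model.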

    Advances in Document Layout Analysis

    [EN] Handwritten Text Segmentation (HTS) is a task within the Document Layout Analysis field that aims to detect and extract the different page regions of interest found in handwritten documents. HTS remains an active topic that has gained importance over the years, due to the increasing demand to provide textual access to the myriad handwritten document collections held by archives and libraries. This thesis considers HTS as a task that must be tackled in two specialized phases: detection and extraction. We see the detection phase fundamentally as a recognition problem that yields the vertical positions of each region of interest as a by-product. The extraction phase consists of calculating the best contour coordinates of the region using the position information provided by the detection phase. Our proposed detection approach allows us to attack both higher-level regions (paragraphs, diagrams, etc.) and lower-level regions such as text lines. In the case of text line detection, we model the problem to ensure that the vertical position yielded by the system approximates the fictitious line that connects the lower part of the grapheme bodies in a text line, commonly known as the baseline. One of the main contributions of this thesis is that the proposed modelling approach allows us to include prior information regarding the layout of the documents being processed. This is performed via a Vertical Layout Model (VLM). We develop a Hidden Markov Model (HMM) based framework to tackle both region detection and classification as an integrated task, and study the performance and ease of use of the proposed approach on many corpora. We review the modelling simplicity of our approach for processing regions at different levels of information: text lines, paragraphs, titles, etc. We study the impact of adding deterministic and/or probabilistic prior information and restrictions via the VLM that our approach provides. 
Having a separate phase that accurately yields the detection position (baselines in the case of text lines) of each region greatly simplifies the problem that must be tackled during the extraction phase. In this thesis we propose to use a distance map that takes into consideration the grey-scale information in the image. This allows us to yield extraction frontiers which are equidistant to the adjacent text regions. We study how the accuracy of our approach increases in proportion to the quality of the vertical positions provided by the detection phase. Our extraction approach gives near-perfect results when human-reviewed baselines are provided. Bosch Campos, V. (2020). Advances in Document Layout Analysis [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/138397

    Handwritten Text Recognition for Historical Documents in the tranScriptorium Project

    © Owner/Author 2014. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM, In Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage (pp. 111-117) http://dx.doi.org/10.1145/2595188.2595193 Transcription of historical handwritten documents is crucial for making these documents more accessible to the general public. Currently, huge amounts of historical handwritten documents are being made available by online portals worldwide. It is not realistic to transcribe these documents manually, and therefore automatic techniques have to be used. tranScriptorium is a project that aims to research modern Handwritten Text Recognition (HTR) technology for transcribing historical handwritten documents. The HTR technology used in tranScriptorium is based on models that are learnt automatically from examples. This HTR technology has been applied to a 15th-century Dutch collection selected for the tranScriptorium project. This paper provides preliminary HTR results on this Dutch collection that are very encouraging, taking into account that minimal resources have been deployed to develop the transcription system. The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 600707 - tranScriptorium and the Spanish MEC under the STraDa (TIN2012-37475-C02-01) research project. Sánchez Peiró, JA.; Bosch Campos, V.; Romero Gómez, V.; Depuydt, K.; De Does, J. (2014). Handwritten Text Recognition for Historical Documents in the tranScriptorium Project. ACM. https://doi.org/10.1145/2595188.2595193

    Impact of Age on Inflammation-Based Scores among Patients Diagnosed with Stage III Non-Small Cell Lung Cancer

    [EN] Background: Inflammatory and nutritional indexes are prognostic factors in non-small cell lung cancer (NSCLC). Furthermore, a low grade of chronic inflammation has been described in the older population (inflammaging). We aimed to evaluate the neutrophil-to-lymphocyte ratio (NLR), the Prognostic Nutritional Index (PNI), the advanced lung cancer inflammation index (ALI), the platelet-to-lymphocyte ratio (PLR), and the Glasgow Prognostic Score (GPS) in young and older patients diagnosed with locally advanced NSCLC to determine if significant differences between these groups exist. Methods: We conducted a retrospective study analyzing the impact of age on the NLR, PNI, ALI, PLR, and GPS among patients diagnosed with stage III NSCLC at Hospital Universitario Doctor Peset between 2010 and 2015. Results: We included 124 patients (84 young, 40 older patients). The median hemoglobin level and leukocyte count were lower in the older patients (p = 0.0158 and p = 0.001, respectively). A higher median C-reactive protein level was also found in this group (p = 0.0095). Regarding specific inflammatory indexes, the PNI, comprising inflammatory and nutritional parameters, was lower among the older patients (p = 0.0463). The median NLR, ALI, and PLR were similar in both age groups. Moreover, no differences between the age groups were found in the percentage of patients showing high versus low NLR (cutoff point, 5) or ALI (cutoff point, 18) or in the different GPS groups. Conclusions: The baseline PNI, hemoglobin level, and lymphocyte count were lower among the older patients; furthermore, CRP was higher, possibly because of a more prominent inflammatory status in older patients with lung cancer. No other immunological or nutritional analytical variables were different between the age groups. Palomar-Abril, V.; Soria-Comes, T.; Tarazona Campos, S.; Martín Ureste, M.; Giner-Bosch, V.; Maestu-Maiques, IC. (2020). Impact of Age on Inflammation-Based Scores among Patients Diagnosed with Stage III Non-Small Cell Lung Cancer. Oncology. 98(8):528-533. https://doi.org/10.1159/000506204
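The abstract above names several inflammation-based indexes but does not give their formulas. The sketch below uses the definitions commonly cited in the literature. It is an assumption that the study used exactly these definitions and units. Function and parameter names are hypothetical. Only the cutoffs (NLR 5, ALI 18) come from the abstract.

```python
def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio (both counts in the same unit)."""
    return neutrophils / lymphocytes


def plr(platelets, lymphocytes):
    """Platelet-to-lymphocyte ratio (both counts in the same unit)."""
    return platelets / lymphocytes


def pni(albumin_g_dl, lymphocytes_per_mm3):
    """Prognostic Nutritional Index, commonly defined as
    10 x serum albumin (g/dL) + 0.005 x total lymphocyte count (per mm^3)."""
    return 10 * albumin_g_dl + 0.005 * lymphocytes_per_mm3


def ali(bmi, albumin_g_dl, nlr_value):
    """Advanced lung cancer inflammation index, commonly defined as
    BMI (kg/m^2) x serum albumin (g/dL) / NLR."""
    return bmi * albumin_g_dl / nlr_value


# Cutoffs reported in the abstract; direction of "high" is assumed to be >=.
NLR_CUTOFF = 5
ALI_CUTOFF = 18

# Made-up values for illustration only, not patient data.
example_nlr = nlr(neutrophils=6000, lymphocytes=2000)
example_ali = ali(bmi=25, albumin_g_dl=4.0, nlr_value=example_nlr)
high_nlr = example_nlr >= NLR_CUTOFF
```

The Glasgow Prognostic Score is omitted here because it is a categorical score built from C-reactive protein and albumin thresholds that the abstract does not state.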

    A polymorphism at the 3'-UTR region of the aromatase gene defines a subgroup of postmenopausal breast cancer patients with poor response to neoadjuvant letrozole

    Background: Aromatase (CYP19A1) regulates estrogen biosynthesis. Polymorphisms in CYP19A1 have been related to the pathogenesis of breast cancer (BC). Inhibition of aromatase with letrozole constitutes the best option for treating estrogen-dependent BC in postmenopausal women. We evaluate a series of polymorphisms of CYP19A1 and their effect on response to neoadjuvant letrozole in early BC. Methods: We analyzed 95 consecutive postmenopausal women with stage II-III ER/PgR [+] BC treated with neoadjuvant letrozole. Response to treatment was measured by radiology at the 4th month by World Health Organization (WHO) criteria. Three polymorphisms of CYP19A1, one in exon 7 (rs700519) and two in the 3'-UTR region (rs10046 and rs4646), were evaluated on DNA obtained from peripheral blood. Results: Thirty-five women (36.8%) achieved a radiological response to letrozole. The histopathological and immunohistochemical parameters, including hormonal receptor status, were not associated with the response to letrozole. Only the genetic variants (AC/AA) of the rs4646 polymorphism were associated with poor response to letrozole (p = 0.03). Eighteen patients (18.9%) reported a progression of the disease. Those patients carrying the genetic variants (AC/AA) of rs4646 presented a lower progression-free survival than the patients homozygous for the reference variant (p = 0.0686). This effect was especially significant in the group of elderly patients not operated on after letrozole induction (p = 0.009). Conclusions: Our study reveals that the rs4646 polymorphism identifies a subgroup of stage II-III ER/PgR [+] BC patients with poor response to neoadjuvant letrozole and poor prognosis. Testing for the rs4646 polymorphism could be a useful tool to guide treatment in elderly BC patients.

    Guidelines for the use and interpretation of assays for monitoring autophagy (4th edition)

    In 2008, we published the first set of guidelines for standardizing research in autophagy. Since then, this topic has received increasing attention, and many scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Thus, it is important to formulate on a regular basis updated guidelines for monitoring autophagy in different organisms. Despite numerous reviews, there continues to be confusion regarding acceptable methods to evaluate autophagy, especially in multicellular eukaryotes. Here, we present a set of guidelines for investigators to select and interpret methods to examine autophagy and related processes, and for reviewers to provide realistic and reasonable critiques of reports that are focused on these processes. These guidelines are not meant to be a dogmatic set of rules, because the appropriateness of any assay largely depends on the question being asked and the system being used. Moreover, no individual assay is perfect for every situation, calling for the use of multiple techniques to properly monitor autophagy in each experimental setting. Finally, several core components of the autophagy machinery have been implicated in distinct autophagic processes (canonical and noncanonical autophagy), implying that genetic approaches to block autophagy should rely on targeting two or more autophagy-related genes that ideally participate in distinct steps of the pathway. Along similar lines, because multiple proteins involved in autophagy also regulate other cellular pathways including apoptosis, not all of them can be used as a specific marker for bona fide autophagic responses. Here, we critically discuss current methods of assessing autophagy and the information they can, or cannot, provide. Our ultimate goal is to encourage intellectual and technical innovation in the field.

    New Head Office of the Bar Association of Valencia

    Research was undertaken at three levels: urban, conceptual and material. At the urban level, the new building was recognised as an interior-exterior suture, a link between centre and periphery and between old and new, where past and present meet. At the conceptual level, internal spatial quality was sought, as opposed to the neutrality, abstraction and flatness of the exterior "flat wall" of the historic centre. On the inside, opposites were explored: two clean, autonomous tube-boxes, open at their ends and linked by the circulation, as opposed to the spatial multiplicity suggested by crossed views, transparencies, reflections and transgressions. On a deep plot of 41x15 m, the value of light and materiality was of special interest, requiring continual trial and error to reach an agreement between the four materials: concrete, metal, wood and glass. Bosch Reig, I.; Campos González, C.; Corell Farinós, V.; Mas Llorens, V. (2006). Nueva sede del ilustre colegio de abogados de Valencia. Arché. (1):191-198. http://hdl.handle.net/10251/32524